Metabolomics by Wehrens Ron; Salek Reza;

Author: Wehrens, Ron; Salek, Reza
Language: eng
Format: epub
Publisher: CRC Press LLC


Multiple Testing

The scenario just described to introduce statistical testing is difficult to apply directly in an untargeted metabolomics context. If one is interested in finding what differentiates red and white roses, it is common to measure up to several thousand variables (features or metabolites) in a number of roses of both colors and to try to identify which variables differ between the two groups, using thousands of tests. These variables are then potential biomarkers. This shift of perspective moves the statistical framework from “single hypothesis testing” to “multiple hypothesis testing,” an apparently small change that has a profound impact on the statistical aspects of the problem. The natural extension of the reasoning just described requires a new H0, which now states that the observed differences in the means of any of the measured variables are the result of chance alone. To test this hypothesis, one could combine the p-values of the single-variable tests, but the probability of finding at least one variable that is significantly different “by chance” clearly grows with the number of variables.

To clarify this point, consider a hypothetical experiment in which our red and white roses are compared on two independent metabolites (variables), each tested at the 0.05 level. For the first variable there is a 5% chance of wrongly concluding that the two populations are different, and the same reasoning holds for the second variable. Since the two tests are independent, the probability of wrongly concluding that the two types of roses differ in at least one variable grows to 9.75% (1 − 0.95 × 0.95). It is easy to imagine the impact of this problem in an untargeted metabolomics experiment dealing with thousands of variables.
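The growth of the chance of at least one false positive can be made concrete with a short calculation. The following is a minimal Python sketch, not code from the book, assuming that all tests are independent and that every null hypothesis is true; the function names `fwer_independent` and `simulate_fwer` are hypothetical. The exact value generalizes the two-variable arithmetic above to 1 − (1 − α)^N, and the simulation checks it by drawing uniform p-values, which is how a valid test's p-value behaves under a true null.

```python
import random


def fwer_independent(alpha: float, n_tests: int) -> float:
    """Probability of at least one false positive among n_tests independent
    tests, each run at level alpha, when every null hypothesis is true."""
    return 1.0 - (1.0 - alpha) ** n_tests


def simulate_fwer(alpha: float, n_tests: int,
                  n_experiments: int = 10_000, seed: int = 1) -> float:
    """Monte Carlo estimate of the same probability.

    Assumes that, under a true null hypothesis, each test's p-value is
    uniformly distributed on [0, 1], so a test rejects with probability alpha.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_experiments):
        # One simulated experiment: draw a p-value for every variable and
        # record whether at least one of them falls below alpha.
        if any(rng.random() < alpha for _ in range(n_tests)):
            hits += 1
    return hits / n_experiments


for n in (1, 2, 10, 1000):
    print(f"N = {n:4d}: exact = {fwer_independent(0.05, n):.4f}, "
          f"simulated = {simulate_fwer(0.05, n):.4f}")
```

For N = 2 the exact value is 0.0975, the 9.75% of the rose example; for N = 1000 it is indistinguishable from 1, meaning at least one "significant" variable is practically guaranteed even when nothing differs.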

Two approaches can be followed to cope with this multiple testing problem. The first is based on the idea that, if the two types of roses are not different, none of the variables should show a statistically significant difference. To control this family-wise error rate (FWER), the probability of at least one false positive among all tests, at a specific level, the most straightforward way is to divide α by the number of variables N: α/N. In the previous example, controlling the FWER at the 5% level means testing each of the two variables at the 2.5% level (0.05/2).

This leads to the well-known Bonferroni correction, which is appealing for its simplicity but extremely strict in experiments with thousands of variables (to control the FWER at the 5% level with 1000 variables, each test should be performed at the 0.05/1000 = 5 × 10⁻⁵ level). A more liberal way to deal with multiple testing is to relax the previous criterion and accept some false positives (also called false discoveries) in the list of variables. How to achieve control of the false discovery rate (FDR) in practice has been the subject of extensive statistical research, and its treatment goes beyond the scope of this book. For a more


